Bit-Parallel Approximate String Matching Algorithms with Transposition
Author
Abstract
Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under the Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the O(kn⌈m/w⌉) algorithm of Wu & Manber, the O(⌈km/w⌉n) algorithm of Baeza-Yates & Navarro, and the O(⌈m/w⌉n) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as the Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.
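To make the idea concrete, the following is a minimal Python sketch of a row-wise bit-parallel search in the style of Wu & Manber, extended with one extra bit vector per error level to track a pending transposition. It is an illustration reconstructed from the description above, not the paper's exact formulation: the names `build_masks`, `search_bitparallel` and `search_dp` are hypothetical, the transposition is the restricted kind (each pair of adjacent characters is swapped at most once and not edited further), and Python's unbounded integers stand in for the ⌈m/w⌉ machine words a real implementation would use. A plain dynamic-programming search is included only as a reference to check against.

```python
def build_masks(pattern):
    """Map each character to a bit mask of the pattern positions where it occurs."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def search_bitparallel(pattern, text, k):
    """Return 0-based end positions in text where pattern matches with <= k edits
    (insertion, deletion, substitution, adjacent transposition)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1          # keeps every vector to m bits
    accept = 1 << (m - 1)        # bit m-1 set: the whole pattern has matched
    pm = build_masks(pattern)
    # Bit i-1 of R[d]: the pattern prefix of length i matches a suffix of the
    # text read so far with at most d errors.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    # Bit i-1 of T[d]: the prefix of length i-2 matched with <= d-1 errors and
    # the last text character equals pattern character i (first half of a swap).
    T = [0] * (k + 1)
    ends = []
    for j, ch in enumerate(text):
        eq = pm.get(ch, 0)
        old_R, old_T = R[:], T[:]
        R[0] = ((old_R[0] << 1) | 1) & eq
        for d in range(1, k + 1):
            T[d] = ((old_R[d - 1] << 2) | 2) & eq
            R[d] = ((((old_R[d] << 1) | 1) & eq)   # match
                    | old_R[d - 1]                  # extra character in the text
                    | (old_R[d - 1] << 1)           # substitution
                    | (R[d - 1] << 1)               # pattern character skipped
                    | (old_T[d] & (eq << 1))        # second half of a transposition
                    | 1) & full                     # prefix of length 1 with one error
        if R[k] & accept:
            ends.append(j)
    return ends


def search_dp(pattern, text, k):
    """Reference: column-wise dynamic programming with the same four operations."""
    m = len(pattern)
    prev2, prev = None, list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        cur = [0] * (m + 1)      # cur[0] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            best = min(prev[i - 1] + cost, prev[i] + 1, cur[i - 1] + 1)
            if (i >= 2 and prev2 is not None
                    and pattern[i - 1] == text[j - 1] and pattern[i - 2] == ch):
                best = min(best, prev2[i - 2] + 1)
            cur[i] = best
        if cur[m] <= k:
            ends.append(j)
        prev2, prev = prev, cur
    return ends


print(search_bitparallel("abcd", "xxacbdxx", 1))  # -> [5]
```

In the usage line, the text contains "acbd", which differs from the pattern "abcd" by one transposition, so a match ending at position 5 is reported with k = 1. Without the `T` vectors the same occurrence would need two edits.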
Similar resources
Explaining and Extending the Bit-parallel Approximate String Matching Algorithm of Myers
The bit-parallel algorithm of Myers [6], which runs in O(mn/w) time, where m is the pattern length, n is the text length and w is the computer word size, is one of the best current algorithms for approximate string matching allowing insertions, deletions and substitutions. We begin this paper by deriving a practically equivalent version of the algorithm of Myers. This is done in a way, which we believe makes...
Approximate Multiple Pattern String Matching using Bit Parallelism: A Review
String matching is the task of finding all occurrences of a given pattern in a large text, both being sequences of characters drawn from a finite alphabet. Approximate String Matching involves the detection of correct patterns along with the detection of some wrong patterns inside the text. Bit Parallelism is a feature that can be used to detect patterns inside the text and is reported to result in mor...
Improved Two-Way Bit-parallel Search
New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...
Restricted Transposition Invariant Approximate String Matching Under Edit Distance
Let A and B be strings with lengths m and n, respectively, over a finite integer alphabet. Two classic string matching problems are computing the edit distance between A and B, and searching for approximate occurrences of A inside B. We consider the classic Levenshtein distance, but the discussion is applicable also to indel distance. A relatively new variant [8] of string matching, motivated in...
Searching Monophonic Patterns within Polyphonic Sources
The string matching problem, in which one must find the occurrences of a pattern string within a text, is well studied in the literature. The problem can be solved efficiently, e.g., by using so-called bit-parallel algorithms. We adapt the bit-parallel approach to music information retrieval. We consider a situation where the pattern is monophonic and the text (the musical sou...
Journal title:
- J. Discrete Algorithms
Volume 3, issue
Pages -
Publication date: 2003